Abstract

Computerized methods have recently shown great potential in providing radiologists
with a second opinion about the visual diagnosis of the malignancy of mammographic
masses. The computer-aided diagnosis (CAD) system we developed for mass
characterization is mainly based on a segmentation algorithm and on the neural
classification of several features computed on the segmented mass. Mass segmentation
plays a key role in most computerized systems. Our technique is gradient-based, and
its main characteristic is that no free parameters have been tuned on the dataset
used in this analysis; it can therefore be applied directly to datasets acquired in
different conditions without any ad hoc modification.

A dataset of 226 masses (109 malignant and 117 benign) has been used in this 
study. The segmentation algorithm works with a comparable efficiency both on ma- 
lignant and benign masses. Sixteen features based on shape, size and intensity of the 
segmented masses are extracted and analyzed by a multi-layered perceptron neural 
network trained with the error back-propagation algorithm. The capability of the sys- 
tem in discriminating malignant from benign masses has been evaluated in terms of 
the receiver-operating characteristic (ROC) analysis. A feature selection procedure has 
been carried out on the basis of the feature discriminating power and of the linear 
correlations interplaying among them. The comparison of the areas under the ROC 
curves obtained by varying the number of features to be classified has shown that 
12 selected features out of the 16 computed ones are powerful enough to achieve the 
best classifier performance. The radiologist assigned the segmented masses to three
different categories: correctly-, acceptably- and non-acceptably-segmented masses. We
initially estimated the area under the ROC curve only on the first category of segmented
masses (88.5% of the dataset), then extended the classification to the second subclass
(reaching 97.8% of the dataset) and finally to the whole dataset, obtaining
$A_z$ = 0.805 ± 0.030, 0.787 ± 0.024 and 0.780 ± 0.023, respectively.

Keywords: Computer-aided diagnosis, breast cancer, mammography, image pro- 
cessing, segmentation, neural networks. 



Introduction 

Breast cancer is still one of the most common forms of cancer among women, although
earlier detection and more effective treatments have contributed to a significant
decrease in breast-cancer mortality during the last decades [1-4]. Mammography
is widely recognized as the most reliable technique for early detection of breast
cancers [5,6]. Once a mass is detected on a mammogram, the radiologist recommends
further investigations, depending on the probability of malignancy assigned to that
lesion. However, the characterization of masses from mammographic images is a very
difficult task, and a high number of unnecessary biopsies are actually performed in
routine clinical activity. The rate of positive findings for cancers at biopsy ranges
from 15% to 30% [7], i.e. the specificity in differentiating malignant from benign
lesions on mammographic images is rather low. As a breast biopsy is an invasive and
expensive procedure, methods to improve mammographic specificity without missing
cancers have to be developed. A higher predictive rate of the mammographic examination
can be achieved by combining the radiologist's interpretation with computer analysis.
Computerized methods have recently shown great potential in assisting radiologists in
the malignant-or-benign decision, by providing them with a second opinion about the
visual diagnosis of the lesion [8-10].

The computer-aided diagnosis (CAD) system for characterizing masses described 
in this paper is based on a three-stage algorithm: first, a segmentation technique 
extracts the mass from the image; then, several features based on size and shape of the 
lesion are computed; finally, a neural classifier merges the features into a likelihood of 
malignancy for that lesion. With respect to a number of CAD systems with a similar 
purpose and using a similar approach already discussed in the literature, the system we 
present shows the distinguishing characteristic that a robust segmentation technique 
has been implemented: it is based on a segmentation algorithm completely free from 
any application-dependent parameter. 

This paper is structured as follows: the methodology is presented in sec. 1, sec. 2
describes the mammographic dataset available for our study and sec. 3 reports on the
analysis details and on the overall system performance.

1 Methodology 

1.1 Mass segmentation 

Mass segmentation is a difficult task because masses vary in size, shape and density.
Masses can exhibit very poor image contrast or can be highly connected to the
surrounding parenchymal tissue. Thus, it is hard in many cases to distinguish the mass
from the nonuniform normal breast tissue. Owing to the high variability in the
appearance of masses, designing a segmentation algorithm able to handle many different
types of masses is a nontrivial task, and much effort has already been devoted to this
issue [11-15].

The segmentation algorithm we developed is an extension and a refinement of the
strategy proposed in [16] for mass segmentation in the CAD analysis of breast tumors
on sonograms. The procedure we propose identifies the mass shape within a Region Of
Interest (ROI) that the radiologist interactively chooses on the mammogram. Although
the radiologist is asked to select the smallest region containing the mass, the ROIs
usually contain the lesions as well as a considerable part of normal tissue. Our
segmentation method aims at removing the non-tumor regions around the tumor in a
ROI by applying the following processing steps (see fig. 1).

Step 1. Once the ROI containing the mass has been selected by the physician, a 
shrinking factor of 8 is applied on both rows and columns within the rectangular 
region both to reduce the high-frequency noise affecting the digitized images and 
to limit the segmentation computing time. 

Step 2. Assuming that masses become denser and denser going from the boundary
to the center, we took the pixel with the maximum intensity value as the starting
point for the segmentation algorithm (seed point). Since the ROI can also contain
pixels belonging to the normal tissue with higher intensity values than the pixels
representing the mass, the center of the ROI is taken as the seed point if the
distance between the maximum-intensity pixel and the center of the ROI exceeds
75% of the ROI half diagonal¹.

Figure 1: Mass segmentation procedure: for the explanation of the algorithms implemented
in each single step see the text.

Step 3. A number of radial lines are drawn from the chosen seed point to the boundary
of the ROI area.

Step 4. We scan the pixels along each radial line starting from the center to the
boundary of the ROI. We look for the pixel whose local variance is maximal. The
local variance is defined as the variance in a predefined n × n matrix with the
currently processed pixel in the center. The pixel with the maximal local variance
is considered to be most likely the boundary point between the mass and the
surrounding tissue: it is referred to in what follows as the critical point. The
smaller the size of the matrix we consider, the more sensitive the procedure is to
small variations or details. This can be a helpful parameter for detecting the arms
and branches of stellate masses. Both to enhance the sensitivity of the segmentation
procedure to small details of the lesions and to reduce the algorithm execution time,
the smallest size of the pixel neighborhood has been chosen in the computation of the
local variance, i.e. n = 3.

Step 5. After scanning all radial lines and finding the critical points corresponding to 
each one of them, the critical points are linearly interpolated. 

Step 6. The region inside the coarse boundary so far identified is filled. We use the 
pixels of this region as seed points for the further steps of the segmentation 
algorithm, whose aim is to lead to a more detailed and more accurate identification 
of the shape of the lesion. 

Step 7. Steps 3 and 4 are iterated for each seed point identified in step 6. What
we obtain is a set of points, detected from different angles, that are most probably
located on the boundary. In order to select the right thin boundary out of this
set, we first tried to assign a vote to a pixel each time it was detected as a
critical point, so that the higher the credit of the critical point, the higher the
probability that it represents the real boundary of the mass. We found that the choice
of the criteria for assigning and thresholding the votes was crucial for the mass
border identification and led to disconnected borders. Therefore, in order to avoid
the unnecessary presence of free application-dependent parameters in this procedure,
we decided to accept all identified points. In this way we end up with a thicker and
more connected border.

¹ As the ROI is a user-drawn rectangle containing the mass, the center of the ROI and the
center of the lesion are not expected to be very different.

Step 8. To complete the identification of the mass, the area inside the border has
to be filled. Since the region may be self-intersecting, it is more convenient to
fill the background. To prevent the filling from entering the mass through possible
gaps in the border, we use a cross-like mask for filling the background. Subtracting
the filled background from the ROI gives the mass.

Step 9. A final filtering is performed in order to remove some possibly present non- 
connected objects. 
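The seed selection of step 2 and the radial search of steps 3 and 4 can be sketched as follows. This is only a minimal illustration in plain Python, not the implementation used in this work: the function names, the number of radial lines (`n_lines`), the one-pixel sampling step along each line and the clipping of the neighborhood at the ROI border are all assumptions, since the text does not specify them.

```python
import math

def local_variance(img, r, c, n=3):
    """Variance of the n x n neighbourhood centred on (r, c), clipped at the ROI border."""
    h = n // 2
    vals = [img[i][j]
            for i in range(max(0, r - h), min(len(img), r + h + 1))
            for j in range(max(0, c - h), min(len(img[0]), c + h + 1))]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def choose_seed(img, max_fraction=0.75):
    """Step 2: brightest pixel, unless it lies too far from the ROI centre."""
    rows, cols = len(img), len(img[0])
    r, c = max(((i, j) for i in range(rows) for j in range(cols)),
               key=lambda p: img[p[0]][p[1]])
    centre = ((rows - 1) / 2.0, (cols - 1) / 2.0)
    half_diagonal = math.hypot(rows, cols) / 2.0
    if math.hypot(r - centre[0], c - centre[1]) > max_fraction * half_diagonal:
        return (rows // 2, cols // 2)   # fall back to the ROI centre
    return (r, c)

def critical_points(img, seed, n_lines=32, n=3):
    """Steps 3-4: on each radial line keep the pixel of maximal local variance."""
    rows, cols = len(img), len(img[0])
    points = []
    for k in range(n_lines):
        angle = 2.0 * math.pi * k / n_lines
        best, best_var, t = None, -1.0, 1
        while True:
            r = int(round(seed[0] + t * math.sin(angle)))
            c = int(round(seed[1] + t * math.cos(angle)))
            if not (0 <= r < rows and 0 <= c < cols):
                break                   # reached the ROI boundary
            v = local_variance(img, r, c, n)
            if v > best_var:
                best, best_var = (r, c), v
            t += 1
        if best is not None:
            points.append(best)
    return points
```

In this sketch the ROI is a list of rows of grey levels (after the shrinking of step 1); steps 5 to 9 (interpolation, re-seeding, background filling and filtering) are not covered.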



1.2 Feature extraction 

Once the masses have been segmented out from the surrounding normal tissue, a set of
morphological and textural features is computed in order to allow a decision-making
system to distinguish benign from malignant lesions. The likelihood of malignancy for a
mass can in fact be estimated on the basis of its morphological and textural appearance, 
which is usually described in terms of the mass size, shape, margin characteristics and 
x-ray attenuation (radio-density) [17-24]. 

Although mass size alone does not predict malignancy, the size of a malignant mass
is indicative of its progression. Therefore features like area and perimeter are usually 
included in the set of features to be computed. 

The mass shape can be round, oval, lobular or irregular. Features like circularity,
convexity, maximum axis and minimum axis can be useful in assessing mass malignancy,
since most benign masses appear circular and convex whereas malignant cases have
irregular non-convex shapes.

The study of the mass margin characteristics is probably the most important in 
determining whether the mass is likely to be benign or malignant. There are five
types of mass margins as defined by BI-RADS® [25]: circumscribed, obscured,
microlobulated, ill-defined, and spiculated. Circumscribed margins are well defined and
sharply demarcated with an abrupt transition between the lesion and the surrounding 
tissue. Microlobulated margins have small undulating circles along the edge of the mass. 
Obscured margins are hidden by superimposed or adjacent normal tissue. Ill-defined 
margins are poorly defined and scattered. Spiculated margins are marked by radiating 
thin lines. The features which can estimate the level of spiculations and evaluate the 
softness or roughness of the margin are: mean and standard deviation of normalized 
radial length, radial length entropy, zero crossing, mean and standard deviation of the 
variation ratio. 

The X-ray attenuation is a description of the density of the mass. Breast cancer 
often appears denser, i.e. whiter, than the surrounding normal breast parenchyma. 
The intensity and its variation inside the mass can be measured by features like: mean, 
standard deviation, kurtosis and skewness of the mass intensity. 
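As an illustration of the four intensity descriptors just named, the sketch below computes them from the list of grey levels of the pixels inside a segmented mass. It assumes the usual standardized-moment definitions with population (1/N) normalization; the function name and the normalization choice are assumptions, not details stated in the text.

```python
import math

def intensity_moments(pixels):
    """Return (mean, std, kurtosis, skewness) of the grey levels inside a mass.

    Kurtosis and skewness are the fourth and third central moments divided
    by sigma**4 and sigma**3 (assumed population normalization, 1/N).
    """
    n = len(pixels)
    mu = sum(pixels) / n
    sigma = math.sqrt(sum((g - mu) ** 2 for g in pixels) / n)
    if sigma == 0.0:
        raise ValueError("constant intensity: kurtosis and skewness are undefined")
    kurtosis = sum((g - mu) ** 4 for g in pixels) / n / sigma ** 4
    skewness = sum((g - mu) ** 3 for g in pixels) / n / sigma ** 3
    return mu, sigma, kurtosis, skewness
```

With this convention a symmetric set of grey levels has zero skewness, and a long tail towards high grey levels gives a positive value.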

Supported by the existing correlations between the morphological and textural fea- 
tures of a mass and its likelihood of malignancy, we computed the already mentioned 
16 features on segmented masses according to the following formulas. 

1. Mass Area: it is given by the number of pixels inside the boundary of the mass. 

2. Mass Perimeter: it is measured by summing up the number of pixels on the 
boundary of the mass. 

3. Circularity:

$$ C = \frac{4 \pi A}{P^2}, \tag{1} $$

where P is the perimeter and A is the area of the mass. The circularity C is
calculated in such a way that for a mass with circular shape C = 1. Changing
the shape to oval or irregular this number decreases.

4. Mean of the Normalized Radial Length:

$$ d_{\mathrm{avg}} = \frac{1}{P} \sum_{i=1}^{P} d(i), \tag{2} $$

where d(i) is the Euclidean distance from the center of mass of the segmented
lesion to the i-th pixel on the mass boundary, normalized with respect to the
maximum distance found for that mass, and P is the mass perimeter.

5. Standard Deviation of the Normalized Radial Length:

$$ \sigma = \sqrt{\frac{1}{P} \sum_{i=1}^{P} \left( d(i) - d_{\mathrm{avg}} \right)^2}. \tag{3} $$

It is a good measure of irregularity. The more spiculations and irregularities are
present, the higher the standard deviation of the radial length.

6. Radial Length Entropy: it is a probabilistic measure computed from the
histogram of the normalized radial length as follows:

$$ E = - \sum_{k=1}^{N_{\mathrm{bins}}} P_k \log P_k. \tag{4} $$

The parameter $P_k$ is the probability that the normalized radial length is
between d(i) and d(i) + 1/$N_{\mathrm{bins}}$, where $N_{\mathrm{bins}}$ is the number of bins
into which the normalized histogram, ranging in the [0,1] interval, has been divided
($N_{\mathrm{bins}}$ = 5 in our analysis).

7. Zero Crossing: it is a count of the number of times the radial distance plot 
crosses the average radial distance. It is an indicator for the degree of spiculation 
of the mass. 

8. Maximum Axis: it is the largest distance connecting one point along
the mass boundary to another point on the mass boundary going through the 
center of mass of the lesion. 

9. Minimum Axis: it is the shortest distance connecting one point along the mass 
boundary to another point on the mass boundary going through the center of 
mass of the lesion. 

10. Mean of the Variation Ratio: first we find the variation of all radial lengths
from their mean value, then we determine the maximum variation magnitude
$var_{\max}$ of the radial length. Only those variations having a magnitude greater than
$var_{\max}/2$ are considered as dominant variations. The variation ratio mean is
computed as the average of those dominant variations.

11. Standard Deviation of the Variation Ratio: it is calculated as the standard 
deviation of the dominant variations with respect to the variation ratio mean. It 
indicates the sharpness of the variations and can be a good indicator for spicula- 
tion. 

12. Convexity: it is the ratio between the mass area and the area of the smallest
convex region containing the mass. If the mass has a regular shape, which means it is
convex, this number is one; otherwise it is smaller.

13. Mean of the Mass Intensity: it is the mean value of the grey-level intensity 
values of the pixels inside the mass boundaries. 

14. Standard Deviation of the Mass Intensity: it is a measure of the smoothness 
of the grey-level intensity values of the pixels inside the mass boundaries. 
15. Kurtosis of the Mass Intensity: it is a measure of how outlier-prone a
distribution is. It is defined as follows:

$$ \mathrm{kurtosis} = \frac{E\left[ \left( g(i,j) - \mu \right)^4 \right]}{\sigma^4}, \tag{5} $$

where g(i,j) is the grey level at location (i, j), and $\mu$ and $\sigma$ are the average
intensity and the standard deviation inside the segmented mass, respectively. It allows
us to investigate how far the intensity distribution of the mass is from a normal
distribution. The kurtosis of the normal distribution is 3. Distributions that are more
outlier-prone than the normal distribution have a kurtosis greater than 3; distributions
that are less outlier-prone have a kurtosis lower than 3.

16. Skewness of the Mass Intensity: it is a measure of the asymmetry of the data 
around the sample mean. It is given by:

$$ \mathrm{skewness} = \frac{E\left[ \left( g(i,j) - \mu \right)^3 \right]}{\sigma^3}. \tag{6} $$

If the skewness is negative, the data are spread out more to the left of the mean 
than to the right. If it is positive, the data are spread out more to the right. The 
skewness of a normal distribution or a perfectly symmetric distribution is zero. 
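The boundary-based descriptors of the list above (items 3 to 7) can be sketched in a few lines. The code below is a hedged illustration only: it assumes the common circularity definition C = 4πA/P², takes the perimeter as the number of boundary pixels, uses the natural logarithm for the entropy and treats the contour as closed when counting zero crossings. The function and key names are hypothetical.

```python
import math

def circularity(area, perimeter):
    """Assumed definition C = 4*pi*A / P**2, equal to 1 for an ideal circle."""
    return 4.0 * math.pi * area / perimeter ** 2

def radial_features(boundary, centroid, n_bins=5):
    """boundary: ordered (row, col) boundary pixels; centroid: (row, col) of the mass."""
    p = len(boundary)                       # perimeter as number of boundary pixels
    raw = [math.hypot(r - centroid[0], c - centroid[1]) for r, c in boundary]
    d_max = max(raw)
    d = [x / d_max for x in raw]            # normalized radial lengths in (0, 1]
    d_avg = sum(d) / p
    sigma = math.sqrt(sum((x - d_avg) ** 2 for x in d) / p)
    # Histogram of the normalized radial length over n_bins equal bins of [0, 1].
    counts = [0] * n_bins
    for x in d:
        counts[min(int(x * n_bins), n_bins - 1)] += 1
    entropy = -sum((k / p) * math.log(k / p) for k in counts if k)
    # Number of times the radial-length plot crosses its mean (closed contour).
    above = [x > d_avg for x in d]
    zero_crossings = sum(above[i] != above[i - 1] for i in range(p))
    return {"d_avg": d_avg, "sigma": sigma,
            "entropy": entropy, "zero_crossings": zero_crossings}
```

A star-like contour alternating long and short radii yields a large zero-crossing count and a non-zero entropy, whereas a contour with constant radial length gives zero for the standard deviation and the entropy.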

1.3 Classification 

Once the features are extracted from the segmented masses, one faces the choice of
an appropriate classification method. Many different approaches implemented for this
purpose have been widely discussed in the literature, such as the Minimum Distance
Classifier [26,27], the K-Nearest Neighbor Distance Classifier [26], the Linear
Discriminant Method (LDA) [13,27,28,33], K-means clustering [26], the Binary
Classification Tree [29] and the Artificial Neural Network (ANN) [17,30-32,34]. Neural
networks are widely used because of their capability of simultaneously processing large
amounts of information, their ability to analyze and classify patterns even when
presented with noisy or partial information, and their ability to adapt their behavior
to the nature of the training data.

Relying on the advantages of a neural approach, we implemented a supervised neural
classifier in our CAD scheme. A standard three-layer feed-forward neural network [35]
has been chosen for this purpose. The general architecture of this ANN consists of n
input, h hidden and two output neurons, and the supervised training phase is based on
the back-propagation algorithm. We used the sigmoid activation function for both the
hidden layer and the output layer, and the on-line learning method, which allows the
weights to be updated after the presentation of each pattern. Updating was synchronous,
i.e. all nodes were updated at the same time. The supervised learning is performed by
showing the network a set of training vectors constituted by the input pattern and the
corresponding target response. In particular, the target [1, 0] has been added to the
vectors of features extracted from malignant masses, whereas the target [0, 1] to those
derived from benign masses. Let us assume [$y_1$, $y_2$] to be the output of the network
for a given input: the corresponding lesion is classified as malignant if $y_1 > y_2$,
otherwise it is assumed to be benign. Whereas the number n of units in the input layer
is fixed a priori by the choice of the number of features to be classified, the number h
of hidden neurons has to be experimentally determined on the basis of the training
dataset.
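A network of the kind described above can be sketched in plain Python as follows. This is a toy illustration under stated assumptions, not the classifier used in this work: the class name `TinyMLP`, the weight initialization range, the learning rate, the squared-error loss and the presence of bias terms are all choices made here for the example.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyMLP:
    """Three-layer feed-forward net: n_in inputs, n_hidden hidden and 2 output neurons."""

    def __init__(self, n_in, n_hidden=3, n_out=2, seed=0):
        rng = random.Random(seed)
        # One row of weights per neuron; the last entry of each row is the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(row, list(x) + [1.0]))) for row in self.w1]
        y = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in self.w2]
        return h, y

    def train_pattern(self, x, target, lr=0.5):
        """One on-line back-propagation step: weights change after each pattern."""
        h, y = self.forward(x)
        # All error terms are computed first, then every weight is updated together.
        delta_out = [(y[k] - target[k]) * y[k] * (1.0 - y[k]) for k in range(len(y))]
        delta_hid = [h[j] * (1.0 - h[j]) *
                     sum(delta_out[k] * self.w2[k][j] for k in range(len(y)))
                     for j in range(len(h))]
        for k, row in enumerate(self.w2):
            for j, v in enumerate(h + [1.0]):
                row[j] -= lr * delta_out[k] * v
        for j, row in enumerate(self.w1):
            for i, v in enumerate(list(x) + [1.0]):
                row[i] -= lr * delta_hid[j] * v

    def is_malignant(self, x):
        """Decision rule: malignant if the first output exceeds the second."""
        _, (y1, y2) = self.forward(x)
        return y1 > y2
```

Training consists of repeatedly calling `train_pattern` with a feature vector and its target, [1, 0] for malignant and [0, 1] for benign.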

Figure 2: Distribution of the mass sizes: the radii of the truth circles annotated by
experienced radiologists are shown for malignant and benign masses.

The performance of the training algorithm was evaluated according to the 5x2
cross validation method [36]. It is the recommended test to be performed on algorithms
that can be executed 10 times, because it can provide a reliable estimate of the
variation of the algorithm performance due to the choice of the training set. This
method consists in performing 5 replications of the 2-fold cross validation method [37].
At each replication, the available data are randomly partitioned into 2 sets ($A_i$ and
$B_i$, for i = 1, ..., 5) with an almost equal number of entries. The learning algorithm
is trained on each set and tested on the other one. The performance the system achieves
in the classification phase is given in terms of the sensitivity and specificity values.
The sensitivity is defined as the true positive fraction (fraction of malignant masses
correctly classified by the system), whereas the specificity is the true negative
fraction (fraction of benign masses correctly classified by the system).
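The partitioning scheme and the two figures of merit described above can be sketched as follows. The function names and the use of a seeded pseudo-random shuffle are assumptions made for the example; labels are taken to be 1 for malignant and 0 for benign, and both classes are assumed to be present in the test set.

```python
import random

def five_by_two_splits(n_samples, seed=0):
    """Return 5 random partitions (A_i, B_i) of the sample indices into two halves."""
    rng = random.Random(seed)
    splits = []
    for _ in range(5):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        half = n_samples // 2
        splits.append((idx[:half], idx[half:]))
    return splits

def train_test_pairs(splits):
    """Each half is used once for training and once for testing: 10 runs in total."""
    for a, b in splits:
        yield a, b   # train on A_i, test on B_i
        yield b, a   # train on B_i, test on A_i

def sensitivity_specificity(y_true, y_pred):
    """True positive fraction and true negative fraction (1 = malignant, 0 = benign)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return tp / n_pos, tn / n_neg
```

Iterating over `train_test_pairs(five_by_two_splits(n))` gives the 10 train/test runs on which the performance is averaged.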

The performances of the neural classifier were also evaluated in terms of a Receiver 
Operating Characteristic (ROC) analysis [38]. In order to show the trade-off between
the sensitivity and the specificity, a ROC curve is obtained by plotting the true positive 
fraction versus the false positive fraction of the cases (1 - specificity), computed while 
the decision threshold of the classifier is varied. Each decision threshold results in a cor- 
responding operating point on the curve, which usually goes through the points (0, 0), 
where the classifier detects no positives, and (1,1), where every pattern is classified as 
positive. 
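The construction of the curve by sweeping the decision threshold can be illustrated as below. Note that this sketch computes an empirical, trapezoidal area under the curve; the text does not state how the area is estimated, so this is an assumption and the result need not coincide with a fitted estimate. The function names are hypothetical and `scores` stands for any per-case malignancy score.

```python
def roc_points(scores, labels):
    """Return the (false positive fraction, true positive fraction) operating points.

    A case is called positive when its score is >= the threshold; the threshold
    is swept over all observed scores, from the strictest to the loosest.
    """
    n_pos = sum(1 for l in labels if l == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]                      # no case classified as positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points                              # last point is (1, 1)

def area_under_curve(points):
    """Trapezoidal area under the piecewise-linear curve through the points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```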



2 Image data set 

The image data set used for this study has been extracted from a large database of
mammograms collected in the framework of a collaboration between physicists from
several Italian Universities and INFN (Istituto Nazionale di Fisica Nucleare) Sections,
and radiologists from several Italian Hospitals [39,40]. The mammograms come both
from screening and from the routine work carried out in the participating Hospitals.
The 18 × 24 cm² mammographic films were digitized by a CCD linear scanner (Linotype
Hell, Saphir X-ray). The digitized images are characterized by an 85 μm pixel pitch and
a 12-bit resolution, thus allowing up to 4096 grey levels. The pathological images are
fully characterized by a consistent description, including the radiological diagnosis, the
histological data and the coordinates of the center and the approximate radius (in
pixel units) of a circle enclosing the masses (truth circle). Mammograms with no sign
of pathology are stored as normal images only after a follow-up of at least three years.

A set of 226 masses was used in this study: 109 malignant and 117 benign masses
were extracted from single-view cranio-caudal or lateral mammograms. The distribution
of the mass sizes can be observed in fig. 2, where the histograms of the radius in
pixels of the truth circles indicating the pathological regions, as annotated by
experienced radiologists, are shown for the malignant and benign cases. The diameters
of the truth circles in real units are in the range 6.6-94.4 mm. It is worth noting that
the size of the truth circle usually overestimates the real size of the mass. The dataset
we analyzed can be considered as representative of the patient population that is sent
for biopsy under the current clinical criteria.
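The conversions between pixel units and physical units used in this section follow directly from the quoted pixel pitch and bit depth. The short sketch below only restates that arithmetic; the constant and function names are hypothetical.

```python
PIXEL_PITCH_MM = 0.085      # 85 micrometre pixel pitch of the digitized images
GREY_LEVELS = 2 ** 12       # 12-bit resolution

def pixels_to_mm(n_pixels):
    """Length in millimetres of a segment spanning n_pixels at full resolution."""
    return n_pixels * PIXEL_PITCH_MM

def mm_to_pixels(length_mm):
    """Number of full-resolution pixels covering a length given in millimetres."""
    return length_mm / PIXEL_PITCH_MM
```

For instance, the quoted diameter range of 6.6-94.4 mm corresponds to roughly 78 to 1111 pixels at full resolution.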

Figure 3: Examples of correctly-segmented masses: two malignant masses on the left and
two benign masses on the right.

3 Results 

3.1 Segmentation Result 

The efficiency of the segmentation algorithm was directly evaluated with the assistance
of an experienced radiologist who, after selecting the ROI to be analyzed², assigned the
masses automatically segmented by the system to one of the following three categories:
correctly-segmented, acceptably-segmented or non-correctly-segmented masses.

The radiologist has been asked to classify as correctly segmented only those masses
whose identified boundary was sufficiently close to the one she would have drawn by hand
on the image. Although the borders of the masses are usually not very sharp in
mammographic images, the segmentation procedure we propose leads in most cases to a
quite accurate identification of the mass shapes (see fig. 3). In fact, 200 masses (95
malignant and 105 benign) out of the dataset of 226 cases were correctly segmented,
leading to an efficiency of correct segmentation $\epsilon_{CS}$ = 88.5%. Out of the 26
remaining cases, 21 masses were assigned to the category of acceptably-segmented masses,
whereas 5 masses were definitely rejected by the radiologist as non-correctly-segmented
cases. Some examples of acceptably-segmented masses are shown in fig. 4. It can be noticed
that these masses are usually characterized by an obscured margin or are not
fully visible in the available mammographic field of view. In such cases the interactive
selection of an appropriate ROI becomes particularly difficult and user-dependent,
ending up with a non-satisfactory identification of at least a portion of the mass
margin. In the case of non-correctly-segmented masses, too large a portion of the mass
margin is not correctly identified. If the acceptably-segmented masses are added to the
set of correctly-segmented masses, the fraction of the dataset that will be analyzed
and classified by the CAD system reaches the value $\epsilon_{CS+AS}$ = 97.8%.
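The two efficiencies quoted above follow directly from the case counts given in this section (200 correctly-segmented and 21 acceptably-segmented masses out of 226). Written out, with the two fractions denoted here by $\epsilon_{CS}$ and $\epsilon_{CS+AS}$:

```latex
\[
\epsilon_{CS} = \frac{200}{226} \approx 88.5\%, \qquad
\epsilon_{CS+AS} = \frac{200 + 21}{226} = \frac{221}{226} \approx 97.8\%.
\]
```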

3.2 Analysis of the extracted features 

Although each of the 16 features we computed on the segmented masses can potentially
highlight a different characteristic of a mass and contribute to a good classification
result, we performed some tests to evaluate the discriminatory power of each feature
and the degree of linear correlation among the different features. We restricted this
analysis to the features extracted from the 200 correctly-segmented masses.

² The current CAD GUI allows the radiologist to select a maximum area of 1000×1000 pixels on the
digitized mammogram. Although this technical restriction could easily be removed, it does not affect the
current analysis, as the maximum actual size of the largest mass in our database is less than 1000 pixels.
The disagreement of this statement with the histogram reported in fig. 2 is only apparent because the
radiologist's truth circles are usually very conservative; a factor of 0.8 ± 0.3 actually occurs between half of
the maximum mass axis (computed on the segmented masses and confirmed by an experienced radiologist)
and the radiologist's annotated mass radius for the truth circles.

Figure 4: Examples of acceptably-segmented masses: two malignant masses on the left and
two benign masses on the right.

The distributions of the features computed for the malignant and benign cases are
plotted in figures 5 and 6³. The mean values of the features extracted from malignant
and benign masses show highly significant differences (p-values < 0.01) for 13 features
out of the 16 we computed. For one of the remaining features (the kurtosis of the
intensity) the means are significantly different (p < 0.05). The only two features not
showing a significant difference in the mean values are the mean of the variation ratio
and the skewness of the intensity.

The analysis of the linear correlations $\rho(i,j)$ among the 16 features (see the matrix
of the correlation coefficients in tab. 1) leads to the following considerations:

• the perimeter is highly correlated with the zero crossing, the maximum axis, the
minimum axis and the area; if the perimeter is excluded from the feature set, the
correlations among the remaining features all satisfy the constraint $\rho(i,j)$ < 0.93;

• if we set the threshold $\rho(i,j)$ < 0.9 on the correlation coefficients, we also have
to exclude from the set the radial length entropy, the maximum axis and the
minimum axis, ending up with 12 remaining features.
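The kind of check described in the two points above can be sketched as follows: compute the linear correlation coefficient for every pair of features and list the pairs exceeding a threshold. This is an illustration only; comparing the absolute value of the coefficient with the threshold is an assumption, the function names are hypothetical, and the decision about which member of a flagged pair to drop is left to the analyst, as in the text.

```python
import math

def pearson(x, y):
    """Linear (Pearson) correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def highly_correlated_pairs(columns, threshold=0.9):
    """columns: dict mapping a feature name to its list of values over all masses."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) > threshold:
                pairs.append((a, b))
    return pairs
```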

On the basis of this preliminary analysis of the features, we decided to select
the optimal set of features to be finally computed by the CAD system by comparing 
the neural network performances obtained with different choices of the feature set 
cardinality. 

3.3 Classification 

We prepared 5 different train and test sets for the 5x2 cross validation analysis, by 
randomly assigning each of the 200 vectors of features to the train or test set for each 
of the 5 different trials. 

We trained 4 different sets of 10 networks by varying the number of features taken 
into account for the classification: 

a) all 16 features are considered; 

b) 15 features are considered (the perimeter, which has the highest correlations with
other features, is excluded);

c) 14 features are considered (the mean value of the variation ratio and the skewness of
the intensity are excluded, as they have the poorest discriminating power);

d) 12 features are considered (the perimeter, the radial length entropy, the maximum
axis and the minimum axis are excluded, as they have $\rho(i,j)$ > 0.9 with some
other features).
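The four feature sets a) to d) can be written down as index sets, using the numbering of the features given in sec. 1.2. The dictionary and function names below are hypothetical; the sketch only serves to make the set cardinalities explicit.

```python
# Feature numbering as in the list of sec. 1.2.
FEATURES = {
    1: "area", 2: "perimeter", 3: "circularity",
    4: "mean normalized radial length", 5: "std normalized radial length",
    6: "radial length entropy", 7: "zero crossing",
    8: "maximum axis", 9: "minimum axis",
    10: "mean variation ratio", 11: "std variation ratio", 12: "convexity",
    13: "mean intensity", 14: "std intensity",
    15: "kurtosis intensity", 16: "skewness intensity",
}

def feature_subset(excluded):
    """Indices of the features kept once the excluded ones are removed."""
    return [i for i in FEATURES if i not in excluded]

FEATURE_SETS = {
    "a": feature_subset(set()),          # all 16 features
    "b": feature_subset({2}),            # perimeter excluded
    "c": feature_subset({10, 16}),       # least discriminating features excluded
    "d": feature_subset({2, 6, 8, 9}),   # highly correlated features excluded
}
```

The number of input neurons of each network is then simply the length of the corresponding list.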

³ Notice that the shrinking factor of 8 applied to both rows and columns of the original image has not
been taken into account in drawing the distributions of the quantities measured in pixel units.

Figure 5: Distributions of the following features computed for malignant and benign masses:
area; perimeter; circularity; mean and standard deviation of normalized radial length; radial
length entropy; zero crossing; maximum axis.

Figure 6: Distributions of the following features computed for malignant and benign masses:
minimum axis; mean and standard deviation of variation ratio; convexity; mean, standard
deviation, kurtosis and skewness of the mass intensity.

Table 1: Matrix of the linear correlation coefficients $\rho(i,j)$ (expressed as percentages)
among the 16 features. The features are numbered with i = 1, ..., 16 according to the
descriptions given in sec. 1.2.

The architecture of the three-layer feed-forward neural network we used consists of n
input, 3 hidden and 2 output neurons, n depending on the choice of the number of
features to be classified. We experimentally observed that the network performance
for all choices of datasets is optimized with 3 neurons in the hidden layer.

Since the classifier performance and the comparison among different classifiers are
conveniently evaluated in terms of the area $A_z$ under the ROC curve, we report in
tab. 2 the estimated areas under the ROC curves obtained in each trial. The average $A_z$
obtained on the test sets and the standard deviation over the 10 different trials
for each set of features are also reported. As can be noticed, the performance the neural
classifiers achieve is robust, i.e. almost independent of the partitioning of the
available data into the train and test sets. The mean $A_z$ values obtained in classifying
16, 15 and 12 features, i.e. in the a), b) and d) cases, are indeed very similar
(p-values > 0.05). Even in the c) case, where the less discriminating features are
excluded from the set of features, the mean $A_z$ value is not significantly different from
those obtained with different choices of the features to be taken into account
(p-values > 0.05).

We can conclude that, being smaller but as predictive as the other sets of features,
the d) set, constituted by the 12 least correlated features, can be considered as the
optimal set of features to be extracted from segmented masses to determine their
likelihood of malignancy.

The mean $A_z$ value obtained with this system configuration on the sets of
correctly-segmented masses is $A_z$ = 0.805 ± 0.030. The ROC curves realizing the minimum
and maximum $A_z$ values (0.756 and 0.849, respectively) are reported in fig. 7, where they
are compared to the corresponding ROC curves achievable by running the CAD on the
dataset of the correctly-segmented masses extended with the 21 acceptably-segmented ones
(column e) in tab. 2) and finally including also the 5 non-acceptably-segmented cases
(column f) in tab. 2).

As shown in fig. 7, in the d) case sensitivity values to malignant masses as high as
80-85% correspond to specificity values in the 70-80% range.

Table 2: Evaluation of the performance of the neural classifiers trained on feature sets
with different cardinalities: the $A_z$ values obtained on the test sets according to the 5x2
cross validation method are reported.



3.4 The reject option 

As the segmentation algorithm leads to three classes of segmented masses according 
to the radiologist's evaluation, we considered the opportunity of exploiting this quality 
control performed on the segmentation step of the analysis to minimize the number 
of cases misclassified by the neural decision-making system. We adopted the reject 
option [41,42], i.e. we evaluated the convenience of not assigning a class to the input 
sample (rejection of the sample), rather than risking a wrong classification. A suitable 
criterion for rejection has to reject the highest possible percentage of samples which 
would otherwise be misclassified. The reject option is based on an estimate of the 
classification reliability, measured by a reliability evaluator ψ. Once a reject threshold σ 
has been fixed, a sample is rejected if the corresponding value of ψ is below σ. We set 
a trivial correspondence between the values assumed by the ψ function and the 
radiologist's opinion about the quality of mass segmentation: ψ = 1 for correctly-segmented 
masses, ψ = 0.5 for acceptably-segmented masses and ψ = 0 for non-correctly-segmented 
masses. In other words, the ψ function is directly determined by the radiologist, rather 
than being implemented as a function to be automatically derived from data. Only two 
possible classes of values for the threshold σ make sense in this case: 0 < σ_a < 0.5 and 
0.5 < σ_b < 1. If σ_a is set as reject threshold, the correctly-segmented masses and the 
acceptably-segmented ones (corresponding to the ε_CS+AS fraction of the dataset) will be 
classified by the system; otherwise, if σ_b is chosen, only the correctly-segmented masses 
(the ε_CS fraction of the cases) will be classified, whereas the remaining cases will be 
rejected. 
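
The rejection rule can be illustrated with a short sketch. The three reliability values and the rule "reject if the reliability is below the threshold" come from the text; the rating labels, the function names and the particular threshold values 0.25 and 0.75 (arbitrary representatives of the two admissible intervals) are assumptions made for the example. The count of 200 correctly-segmented masses is inferred from the 226 masses of the dataset minus the 21 acceptably-segmented and the 5 non-acceptably-segmented ones.

```python
# Reliability evaluator psi, fixed by the radiologist's rating of the
# segmentation: CS = correctly, AS = acceptably, NAS = non-acceptably segmented.
PSI = {"CS": 1.0, "AS": 0.5, "NAS": 0.0}


def is_rejected(rating, sigma):
    """A sample is rejected when its reliability falls below the threshold."""
    return PSI[rating] < sigma


def classified_fraction(ratings, sigma):
    """Fraction of the dataset that is passed on to the classifier."""
    kept = [r for r in ratings if not is_rejected(r, sigma)]
    return len(kept) / len(ratings)


# Dataset composition: 200 CS (inferred as 226 - 21 - 5), 21 AS, 5 NAS.
ratings = ["CS"] * 200 + ["AS"] * 21 + ["NAS"] * 5

SIGMA_A = 0.25  # any value in (0, 0.5): keeps CS and AS masses
SIGMA_B = 0.75  # any value in (0.5, 1): keeps CS masses only

print(f"{100 * classified_fraction(ratings, SIGMA_A):.1f}%")  # 97.8%
print(f"{100 * classified_fraction(ratings, SIGMA_B):.1f}%")  # 88.5%
```

Any other threshold within the same interval gives the same partition, which is why only two classes of threshold values need to be considered.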

As reported in columns d) and e) of tab. [2], the mean A_z value obtained when 
the ε_CS+AS fraction of cases is classified does not significantly differ from the mean A_z 
computed on the ε_CS fraction of cases (p=0.16). This means that it is not worth setting 
a reject threshold as severe as σ_b, since it would only reduce the rate of 
classified cases without improving the classification reliability. By contrast, 
the comparison between the mean A_z value obtained on the sets including the 
non-correctly-segmented masses, reported in column f), and the mean A_z of column d) shows 
a statistically significant difference (p=0.048). Although this conclusion rests on 
a borderline p value, the choice of σ_a as reject threshold will protect the 
system performances from a slight decrease due to the possible failure of the mass 
segmentation step. Once the reject threshold σ_a is applied to the system, a small subset 
of masses will be initially rejected because the segmentation algorithm did not lead to an 
acceptable result; however, the radiologist in this case has the advantage of being able to 
trust more safely the stability and reliability of the CAD performances on the remaining 
ε_CS+AS fraction of the cases. 

Figure 7: ROC curves obtained in the classification of 12 features (see tab. G|) extracted 
from the datasets of: correctly-segmented masses (CS); correctly-segmented and 
acceptably-segmented masses (CS+AS); correctly-segmented, acceptably-segmented and 
non-acceptably-segmented masses (CS+AS+NAS). 

4 Conclusions and discussion 

We developed a CAD system for the classification of mammographic masses into 
malignant and benign with the aim of supporting radiologists in the visual diagnosis of the 
degree of mass malignancy. Several expert systems with a similar purpose have been 
recently discussed in the literature. In the paper by Timp and Karssemeijer [15] the 
influence of the segmentation method on the performance of a CAD system was 
investigated, obtaining A_z = 0.74, 0.72 and 0.67 for segmentation based on dynamic 
programming, on the discrete contour model and on region growing, respectively. Kinnard et 
al. [17] studied the efficacy of image features versus likelihood features of tumor 
boundaries for differentiating benign and malignant tumors; a region growing technique was 
implemented in the mass segmentation and two different neural networks were adopted 
in the classification. Different combinations of these components led to A_z = 0.66, 0.71 
and 0.84, respectively. Mudigonda et al. [23] compared the discriminating capabilities 
of gradient-based and texture-based features. Different classification approaches on 
different sets of mammograms led to A_z = 0.85, 0.67, 0.6, 0.76, 0.52 and 0.73. Hadjiiski 
et al. [33] developed a hybrid classifier by combining an unsupervised model based 
on an adaptive resonance theory network and a supervised linear discriminant 
classifier. They reached the value A_z = 0.81 for the hybrid classifier, to be compared to 
A_z = 0.78 and 0.80 obtained by means of the linear discriminant classifier alone and 
a back-propagation neural network, respectively. Huo et al. [24] implemented a region 
growing technique to segment masses and extracted radial edge-gradient information 
to estimate the mass malignancy, obtaining A_z = 0.85. Sahiner et al. [22] developed a 
three-stage segmentation method based on clustering, active contour and spiculation 
detection. They evaluated the improvement that the extraction of morphological features 
brings to a mass classification based on texture features extracted from a band of 
pixels surrounding the mass. They obtained A_z = 0.83 ± 0.02, 0.84 ± 0.02 and 0.87 ± 0.02 
on morphological, texture and combined features, respectively. They also combined 
the analysis of different views of a mass, obtaining A_z = 0.91 ± 0.02. The issue of 
improving the classification performances in the mass diagnosis is discussed in the paper 
by Lim and Er [34], where generalized dynamic fuzzy neural networks are introduced. 
In this case, the most appropriate structure for the classifier is automatically obtained 
by means of a self-adaptation of the network structure during the learning process. In 
classifying mammographic masses they obtained A_z = 0.868 ± 0.020. Hadjiiski et al. [21] 
exploited the interval change information to evaluate the mass malignancy. The 
information on the prior image significantly improved the accuracy of mass classification 
from A_z = 0.82 to A_z = 0.88. Another paper by Sahiner et al. [13] discussed the effect 
of mass segmentation on characterization: texture, morphological and spiculation 
features were extracted from masses segmented by a computerized technique and by the 
radiologist, obtaining A_z = 0.89 and 0.88, respectively. Sahiner et al. in a different 
work [10] transformed a band of pixels surrounding a segmented mass into the 
Cartesian plane. They computed and classified texture features of the transformed images, 
ending up with A_z = 0.94. In another study by Huo et al. [8], three different automated 
classifiers were used to merge various features related to the margin and density of the 
masses into a likelihood of malignancy, obtaining A_z = 0.94. 

Although the area A_z under the ROC curve provides a good measure for 
comparing the performances of different CAD systems, the reliability 
of this measurement depends also on the dataset used to train and test the CAD. It is 
very difficult in most cases to evaluate how well populated a particular database is, i.e. 
whether it is sufficiently representative of each possible appearance of the pathology 
one aims to detect. Even the partitioning of the database into train and test sets is 
not a trivial task: one should at least verify a posteriori the robustness of the results 
achieved with respect to different data partitionings. 

The CAD system we present in this paper has been developed according to all 
these considerations. The performances it reaches are in the same range as most of the 
previously reported results. We devoted the main efforts to the development of the 
mass segmentation step. We significantly improved the procedure proposed by Chen 
et al. [16] for the analysis of sonograms, by making the algorithm iterative. In 
particular, as mammographic images have a better resolution than sonograms, an 
appropriate segmentation procedure needs to be particularly sensitive even to subtle 
variations occurring in the mass margins. The iterative procedure we propose is able to 
identify even the very small arms and branches that may occur, especially in malignant 
masses. Furthermore, it can handle masses of various sizes: there is 
no lower size limit, even down to a few pixels. The algorithm execution is not 
computationally expensive; the computation time scales linearly with the size of the mass. 
In contrast with the analysis by Chen et al. [16], we could not find any improvement in the 
mass margin definition, and consequently in the final CAD results, by introducing the 
wavelet transforms in the identification of the critical points defining the mass margin. 

Although the accuracy of segmentation algorithms is usually evaluated in terms of the 
overlap between the area segmented by the CAD and the manual segmentation of the 
mass provided by an experienced radiologist, we have not had the possibility to carry 
out such a study. To test the reliability of our segmentation procedure we adopted the 
following criterion: the radiologist partitioned the segmented masses into three classes, 
according to a decreasing degree of reliability of the segmentation result; the CAD 
performances were evaluated first on the dataset of correctly-segmented masses, obtaining 
A_z = 0.805 ± 0.030; then, the acceptably-segmented and non-correctly-segmented masses 
(which, however, represent small fractions of the dataset) were added in two steps to the 
test sets, obtaining A_z = 0.787 ± 0.024 and 0.780 ± 0.023, respectively. As the 
difference between the A_z values obtained on the fraction ε_CS = 88.5% of correctly-segmented 
masses and the fraction ε_CS+AS = 97.8% of correctly- and acceptably-segmented cases 
is not statistically significant, we can conclude that even when the segmentation does 
not provide an extremely refined identification of the mass margin, the interplay 
between the morphological and textural features extracted from the segmented area still 
leads to a reliable classification result on 97.8% of the database. 